Abstract:


This document summarizes the research performed under AFOSR contract FA9550-08-1-0480,
entitled "Complex network information exchange in random wireless environments." The objective of
this project was to develop novel techniques, structures, and algorithms for the optimization of complex
wireless networks where channels change dynamically and randomly, affecting network performance
and reliability. These random dynamics, while challenging for ensuring robust high-performance network
operation, also create opportunities that adaptive network control policies can exploit. This is
particularly important for advanced military networks operating in rapidly changing, heterogeneous,
and sometimes hostile environments. A main focus of this research was to develop the new technique
of Wireless Network Utility Maximization (WNUM). This technique built on previous work in Network
Utility Maximization (NUM), but extended those ideas to include network and traffic dynamics,
blending techniques from stochastic optimization, stochastic approximation, reinforcement learning,
and economics to yield optimal network control policies that adapt to randomly changing conditions.
WNUM addressed the following questions: what are the important network control variables, what
are the critical flows of control information, and what are the optimal control policies? The research also
explored the optimization of network security protocols that exploit random wireless environments for
security purposes. Finally, compressed sensing and matrix completion ideas were explored to develop
low-complexity network control policies based on sparse approximations of the network state.

Summary of Results:

During the course of the project, we obtained the following key results: 1) We developed
WNUM techniques to optimize the reliability and throughput tradeoffs of networks operating in random
wireless environments, in particular with respect to adaptive modulation; 2) We developed multi-period
NUM (MPNUM) techniques to find optimal control policies in dynamic environments, taking
into account the time sensitivity of traffic; we also developed learning methods for WNUM and MPNUM
for settings where the statistics of the environment are unknown and must be learned; 3) We developed
physical-layer security protocols for relay networks that embed security into the transmission strategy; 4) We
developed reduced-complexity control techniques for complex wireless networks based on sparse
approximation, whereby the full network state is approximated from a small set of samples. Details of
our results in each of these areas are given in the next section.

The work has resulted in nine conference papers and four journal papers.

Detailed Results:

1. Reliability and Throughput Tradeoffs with Wireless Network Utility Maximization

In this effort, we focused on a distributed algorithm for optimizing the rate-reliability tradeoff
in wireless networks where the physical channel has a randomly time-varying characteristic [5].
The idea is to develop an online distributed algorithm, based on stochastic approximation, that finds
optimal control policies for network link power, rate, and reliability in wireless environments.
Utilizing a stochastic version of dual decomposition, we developed an algorithm that learns the
channel characteristics and converges to optimal policies under a broad set of conditions. The
proposed algorithm does not need to know the distribution of the channel states in advance; it
learns the distribution along the way by sampling channel conditions and updating the policy accordingly.
In particular, consider $M$ logical source/destination pairs and $L$ links in the network. Each
source/destination pair is associated with an upper-layer protocol stack, and the routing
of information flows over links is described by the routing matrix $A$, where $A(l,m) = 1$ if
information on flow $m$ traverses link $l$ and is zero otherwise. For the $m$-th data session, $r_m$
denotes the rate of information sent into the link encoder. The ratio of the total number of useful
information bits to the total number of bits exiting the encoder per unit time is termed the code
rate $0 < \theta_l < 1$. Encoded bits are removed from the link buffer and transmitted by the wireless
link at rate $R_l$. For the channel condition, consider the channel matrix $G \in \mathbb{R}^{L \times L}$, where
$G_{ij}$ is the power gain from the transmitter on link $j$ to the receiver on link $i$. The vector of
transmitter powers is $S \in \mathbb{R}^L$. Each transmitter, say at link $l$, has an average power
budget $\bar S_l$. The error probability of bits flowing over link $l$ is denoted by $X(\theta_l)$. For any
useful code, this error probability is a nondecreasing function of the code rate $\theta$. We used the
following model in our calculations:

$$X(\theta) = \;\cdots \qquad (1)$$

where $N$ is the code block length used by the encoder and $R_0$ is the cutoff rate. The
reliability $\phi_m$ of information flow $m$ is then defined through

$$\phi = \mathbf{1} - A^T X(\theta), \qquad (2)$$

where the $m$-th entry of $A^T X(\theta)$ is the sum of the error probabilities on the links traversed by
flow $m$, i.e., $\phi_m = 1 - \sum_l A(l,m)\, X(\theta_l)$.

The performance of upper-layer protocols is modeled by utility functions. Each source $m$ has
a utility function $U(r_m, \phi_m)$. Utility functions are strictly concave, increasing functions of the
information rate and information reliability. In this work, we use the following parameterized
family of utility functions:

$$U(r, \phi) = \beta \log r + (1 - \beta) \log \phi, \qquad (3)$$

where $0 < \beta < 1$ weights the relative importance of information rate and reliability.
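As a small numerical illustration of (2) and (3), the Python sketch below evaluates the flow reliabilities and the total utility for a hypothetical three-link, two-flow topology. The per-link error probabilities $X(\theta_l)$ and the rates are assumed numbers supplied directly; they are not computed from (1) and are not taken from the project's experiments.

```python
import numpy as np

def flow_reliability(A, link_err):
    """Eq. (2): phi = 1 - A^T X(theta).

    A        -- L x M routing matrix, A[l, m] = 1 if flow m traverses link l
    link_err -- length-L vector of per-link error probabilities X(theta_l)
    """
    return 1.0 - A.T @ link_err

def utility(r, phi, beta):
    """Eq. (3): U(r, phi) = beta * log(r) + (1 - beta) * log(phi), 0 < beta < 1."""
    return beta * np.log(r) + (1.0 - beta) * np.log(phi)

# Hypothetical topology: flow 0 uses links 0 and 1, flow 1 uses links 1 and 2.
A = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=float)
link_err = np.array([0.01, 0.02, 0.005])   # assumed X(theta_l) values
r = np.array([2.0, 1.0])                   # assumed information rates r_m
phi = flow_reliability(A, link_err)        # approximately [0.97, 0.975]
total_utility = utility(r, phi, beta=0.7).sum()
```

Because reliability is computed as one minus a sum of link error probabilities, a link shared by several flows (link 1 here) lowers the reliability of every flow routed over it.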

The system can adapt to changing channel conditions by estimating $G$ and adapting parameters
such as the transmit power $S = S(G)$, the transmitter link rate $R = R(S(G), G)$, the information rate
$r = r(G)$, the code rate $\theta(G)$, and the information reliability $\phi(G)$. Given the above definitions, the
problem of finding the optimal rate, reliability, and power is formulated as follows. The goal
is to find the adaptive rate vector $r(G)$, reliability vector $\phi(G)$, and power vector $S(G)$ that
maximize the average utility of the network, under constraints on information rates, link rates,
reliability, and average transmitted power:


$$\begin{aligned}
\text{Maximize:}\quad & E\Big[\sum_m U_m\big(r_m(G), \phi_m(G)\big)\Big] & (4)\\
\text{Subject to:}\quad & E[S_l(G)] \le \bar S_l, \quad l \in \{1, 2, \ldots, L\}, & (5)\\
& E[A\, r(G)] \le E\big[\mathrm{Diag}(\theta(G))\, R(S(G), G)\big], & (6)\\
& E[\phi(G)] \le \mathbf{1} - E\big[A^T X(\theta(G))\big], & (7)\\
& 0 < \theta(G) < 1, & (8)\\
& 0 < \phi(G) < 1, & (9)
\end{aligned}$$

where $E$ is the expectation operator and the optimization variables are $S(G)$, $r(G)$, $\theta(G)$, and $\phi(G)$.

The main algorithm that solves this problem can now be presented.

(Primal-dual iterative solution for W-NUM): For the design parameters specified above,
consider the following coupled iterative equations, where $t$ is a non-negative integer counter:

Initialization: Initialize all the parameters with a random feasible point.

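Primal step, (10)-(14): each primal variable $x$ (the transmit powers $S_l$, information rates $r_m$,
reliabilities $\phi_m$, code rates $\theta_l$, and an auxiliary per-link variable $\pi_l$) takes a stochastic gradient
step on the Lagrangian $\mathcal{L}(x, \Lambda; G)$ of (4)-(9), evaluated at the current channel sample $G^t$ and
the previous prices:

$$x^{t} = x^{t-1} + \alpha_t\, \frac{\partial \mathcal{L}}{\partial x}\big(x^{t-1}, \Lambda^{t-1}; G^{t}\big).$$

The power update involves the link-rate sensitivity $\partial R_l(S,G)/\partial S_l$. Here $\{\alpha_t\}_{t=0}^{\infty}$ and
$\{\sigma_t\}_{t=0}^{\infty}$ are positive step-size sequences that are square summable but not summable, and
$F_l = \{(l,m) : A(l,m) \neq 0\}$ is the set of all the flows traversing link $l$.

Price update, (15)-(18): each Lagrange multiplier takes a projected step along the sampled violation
of its constraint. For the power constraint (5) and the reliability constraint (7), for instance,

$$\lambda_{S,l}^{t} = \Big[\lambda_{S,l}^{t-1} + \sigma_t\big(S_l^{t-1} - \bar S_l\big)\Big]^+,$$

$$\lambda_{\phi,m}^{t} = \Big[\lambda_{\phi,m}^{t-1} + \sigma_t\Big(\phi_m^{t-1} - 1 + \sum_l A(l,m)\, X(\theta_l^{t-1})\Big)\Big]^+,$$

and the remaining multipliers, associated with the link-rate constraint (6), are updated in the same way.

Here, for any real $s$, $[s]^+ = \max\{s, 0\}$, and $\Lambda$ is the vector of Lagrange
multipliers associated with the constraints in the W-NUM formulation.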
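To make the structure of the primal-dual iteration concrete, the sketch below applies the same idea (stochastic primal gradient steps followed by projected price updates, driven only by sampled channel states) to a deliberately simplified, hypothetical problem: a single link and flow, utility $\beta \log r$, a rate constraint $r \le E[\log(1 + G\,S(G))]$, and an average power budget. The channel model, the constant step sizes, and the rate function are illustrative assumptions; this is not the W-NUM model of (4)-(9).

```python
import math
import random

def stochastic_primal_dual(gains, probs, power_budget, beta=1.0,
                           alpha=0.002, sigma=0.002, iters=200_000, seed=0):
    """Stochastic primal-dual iteration for a toy single-link problem.

    Hypothetical simplified problem (not the full W-NUM model):
        maximize   beta * log(r)
        subject to r       <= E[log(1 + G * S(G))]   (price lam_r)
                   E[S(G)] <= power_budget           (price lam_s)
    S(G) keeps one power level per channel state. The update rules only see
    sampled channel states; `probs` is used solely to simulate the channel.
    Returns (average rate, average power per state) over the second half.
    """
    rng = random.Random(seed)
    n = len(gains)
    r, S = 0.5, [0.5] * n
    lam_r, lam_s = 1.0, 1.0
    sum_r, sum_S, count = 0.0, [0.0] * n, 0
    for t in range(iters):
        k = rng.choices(range(n), weights=probs)[0]   # observe a channel sample
        g, s_k = gains[k], S[k]
        # Primal step: gradient ascent on the Lagrangian at the sampled state.
        r_new = max(r + alpha * (beta / r - lam_r), 1e-3)
        S[k] = max(s_k + alpha * (lam_r * g / (1.0 + g * s_k) - lam_s), 0.0)
        # Price step: projected update along the sampled constraint violations.
        lam_r = max(lam_r + sigma * (r - math.log(1.0 + g * s_k)), 0.0)
        lam_s = max(lam_s + sigma * (s_k - power_budget), 0.0)
        r = r_new
        if t >= iters // 2:   # average the second half of the iterates
            count += 1
            sum_r += r
            sum_S = [a + b for a, b in zip(sum_S, S)]
    return sum_r / count, [x / count for x in sum_S]

gains, probs = [0.5, 1.0, 2.0], [1 / 3, 1 / 3, 1 / 3]
rate, power = stochastic_primal_dual(gains, probs, power_budget=1.0)
```

For these assumed numbers the exact optimum is the water-filling allocation $S(G) = [w - 1/G]^+$ with water level $w = 13/6$, i.e., roughly 0.17, 1.17, and 1.67 for the three states, so the averaged iterates should settle near these values with more power on better channel states. The sketch uses constant steps plus iterate averaging to damp the sampling noise; the diminishing, square-summable step sizes described above are the alternative.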


We studied the convergence of this algorithm and proved that its output is the global
optimum of the W-NUM problem above. We also showed that the different steps of this
algorithm can be computed from local information, so it can be implemented in a
distributed fashion. In the future, we will further modify and apply the above control framework
to unreliable networks under attack.
2. Multi-Period WNUM (MPNUM): MPNUM captures the time sensitivity of wireless traffic by
introducing a new class of utility functions and reformulating the problem as an infinite-horizon
average-cost Markov decision problem (MDP). Our MPNUM work thus developed theoretical
methods to find optimal control policies in dynamic environments when the stochastic properties
are well described. We applied these techniques to adaptive modulation and power control
and compared their performance against algorithms, such as water-filling, that do not take traffic
statistics into account [6, 8]. We also examined complex utility functions that involve the ratio
of stochastic network parameters in [9]. We further extended MPNUM to wireless environments
whose stochastic properties are unknown, using reinforcement learning techniques such as
Least Squares Temporal Difference Learning (LSTD-Learning). Periodic NUM (PNUM) extends
WNUM to cyclo-stationary wireless or wireline traffic and, like MPNUM, yields adaptive online
control policies for rate, power, and queuing delay [11].

MPNUM [6, 12, 13] uses time-smoothed utility functions to model the upper-layer performance of
data flows through a wireless network as functions of the time-averaged rate at which packets are
injected into the network. The time averaging serves two purposes: first, it captures differences
in the characteristics of different data types, and second, it reflects the observation that upper-layer
protocols often operate at longer time scales than those used by the physical layer. The
associated MDP is of the form


$$\lim_{N \to \infty} \frac{1}{N}\, E\left[\sum_{t=0}^{N-1} \sum_{m} U_m\big(\bar r_m^{\,t}\big) \;\cdots\right] \qquad (19)$$


where $U_m(\bar r_m^{\,t})$ is the time-smoothed utility function of the $m$-th data flow, and the remaining
terms of (19) account for the resources required to support the traffic carried by the network. Finding
analytical solutions to this problem is very challenging, and our work in this first phase of MPNUM
focused on characterizing the optimal control policies and exploring different numerical techniques.
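The smoothing rule is not spelled out here, so the following sketch simply assumes an exponentially weighted moving average with a hypothetical smoothing factor `eps` and a log utility. It estimates the utility part of the long-run average in (19) over a finite horizon from one sample path, and shows how smoothing changes the value assigned to bursty traffic with the same mean rate.

```python
import math

def smoothed_utility_average(rates, eps, utility=math.log, r0=1.0):
    """Finite-horizon estimate of (1/N) * sum_t U(rbar_t) for a single flow.

    rates   -- instantaneous injected rates r_t for t = 0, ..., N-1
    eps     -- hypothetical smoothing factor in (0, 1]; eps = 1 means no smoothing
    utility -- assumed concave utility applied to the smoothed rate
    r0      -- initial value of the smoothed rate
    """
    rbar, total = r0, 0.0
    for r in rates:
        rbar = (1.0 - eps) * rbar + eps * r   # exponentially smoothed rate
        total += utility(rbar)
    return total / len(rates)

bursty = [0.2, 1.8] * 500                                # mean rate 1.0
unsmoothed = smoothed_utility_average(bursty, eps=1.0)   # about -0.51
smoothed = smoothed_utility_average(bursty, eps=0.05)    # close to 0
```

With a concave utility, an unsmoothed utility heavily penalizes rate fluctuations, while a time-smoothed one mainly reflects the average rate; this is one way a utility can encode how tolerant a data type is to short-term variation.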

In the second phase of MPNUM, we explored ways to find basis functions to drive LSTD-Learning.
Since no general method for finding these basis functions exists yet, our approach used
a novel method based on finding solutions to differential equations derived from (19),
as reported in [13]. In addition, our analysis found that wireless networks modeled in this way exhibit
state space collapse. The notion of state space collapse comes from the heavy-traffic theory of
stochastic networks. In this context, a reduction in dimension is obtained through a separation
of time scales, much like in singular perturbation analysis of dynamical systems and Markov
chains.
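For readers unfamiliar with LSTD, the sketch below shows the standard LSTD(0) estimator, which turns a fixed set of basis functions and one observed trajectory into a linear approximation of the cost-to-go. It uses a discounted-cost criterion as a simplified stand-in for the average-cost MDP of (19), and the two-state chain, discount factor, and small ridge term are illustrative assumptions. With tabular (identity) features it reduces to solving $(I - \gamma \hat P) w = \bar c$ for the empirical transition matrix $\hat P$, so it can be checked against the exact solution.

```python
import numpy as np

def lstd0(states, costs, features, gamma, ridge=1e-6):
    """LSTD(0) for a discounted-cost Markov chain.

    Solves A w = b, where
        A = sum_t phi(s_t) (phi(s_t) - gamma * phi(s_{t+1}))^T
        b = sum_t phi(s_t) * c_t
    states   -- visited states s_0, ..., s_T (length T + 1)
    costs    -- one-step costs c_0, ..., c_{T-1}
    features -- array whose row s is the feature vector phi(s)
    ridge    -- small diagonal term that keeps A invertible early on
    Returns the weights w; the approximate cost-to-go is features @ w.
    """
    k = features.shape[1]
    A = ridge * np.eye(k)
    b = np.zeros(k)
    for t in range(len(costs)):
        phi, phi_next = features[states[t]], features[states[t + 1]]
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * costs[t]
    return np.linalg.solve(A, b)

# Usage on a hypothetical two-state chain with tabular features.
rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
state_cost = np.array([1.0, 0.0])   # cost incurred in each state
gamma = 0.9
states = [0]
for _ in range(50_000):
    # Next state is 1 with probability P[s, 1] (two-state chain).
    states.append(int(rng.random() < P[states[-1], 1]))
costs = state_cost[states[:-1]]
w = lstd0(states, costs, np.eye(2), gamma)
v_true = np.linalg.solve(np.eye(2) - gamma * P, state_cost)
```

With richer, lower-dimensional features the quality of the estimate depends entirely on how well the basis spans the true cost-to-go, which is why the choice of basis functions matters.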


3. Physical-Layer Network Security: In this work we considered the design and analysis of security
protocols for relay networks that exploit the physical properties of the wireless medium.
We studied a half-duplex relay network in which communication between a source node and a
destination node is assisted by a relay station, while a passive eavesdropper can overhear radio
signals from all legitimate nodes. All communicating parties, including the eavesdropper, can be
equipped with multiple antennas.

One of the key discoveries of this body of work is that it is possible to achieve secrecy rates that
grow linearly in the SNR in some very challenging scenarios, in which the eavesdropper has
more antennas than all the legitimate nodes [4]. The main insight is to let the legitimate nodes
cooperate suitably so that they gain a collective advantage over the attacker despite their
individual disadvantages. From a designer's viewpoint, the main lesson learned is to let the source
and destination nodes alternately jam the eavesdropper in the different phases of the relaying
protocol.

We also found that, to achieve high-rate secure communication, it suffices for the relay station to
employ off-the-shelf protocols such as amplify-and-forward and compress-and-forward relaying
[14]. Furthermore, the operation at the relay station does not require any explicit information
about the channel state of the eavesdropper. In practice, this finding implies that low-cost,
low-complexity relay nodes with no secure encoding and decoding can be used with little loss in
performance.

4. Structure-Based Learning in Wireless Networks via Sparse Approximation: In this work, a
novel framework for the online learning of expected cost-to-go functions characterizing wireless
network performance is proposed. The work is motivated by the fact that physical network
states can change drastically after a WMD attack, and such changes must be quickly assessed
in order to optimize communication and control schemes. The proposed framework is based on the
observation that wireless protocols induce structured and correlated behavior of the Finite State
Machine (FSM) modeling the operations of the network. As a result, a significant dimension
reduction can be achieved by projecting the cost-to-go function onto a graph wavelet basis set
capturing typical sub-structures in the graph associated with the FSM. Sparse approximation
with random projection is then used to identify a concise set of coefficients representing the
cost-to-go function in the wavelet domain. This Compressed Sensing (CS) approach enables a
considerable reduction in the number of observations needed to achieve an accurate estimate of
the cost-to-go function.

Specifically, the network is modeled as an FSM whose state evolves within the state space $\mathcal{S}$,
with $N = |\mathcal{S}|$. Define $S(t) \in \mathcal{S}$ as the state of the FSM at time $t = 0, 1, 2, \ldots$. We assume that the
sequence $\mathbb{S} = \{S(0), S(1), S(2), \ldots\}$ is a Markov process with transition probabilities

$$p(s,s') = \mathbb{P}\big(S(t+1) = s' \mid S(t) = s\big), \qquad (20)$$

where $\mathbb{P}(\cdot)$ denotes the probability of an event. The performance of the network is measured by
a function $c(s, s')$ that assigns a positive and bounded cost to the transition from state $s$ to state
$s'$. The average cost from state $s$ is

$$\bar c(s) = E_{s' \in \mathcal{S}}\big[c(s, s')\big] = \sum_{s' \in \mathcal{S}} p(s, s')\, c(s, s'). \qquad (21)$$
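Eq. (21) is a row-wise weighted sum, which in matrix form is the row sum of the element-wise product of the transition matrix and the matrix of transition costs. A minimal NumPy sketch, with a hypothetical two-state chain and assumed (positive) transition costs:

```python
import numpy as np

def average_cost(P, C):
    """Eq. (21): cbar(s) = sum_{s'} p(s, s') * c(s, s').

    P -- N x N transition matrix, P[s, s'] = p(s, s')
    C -- N x N matrix of transition costs c(s, s')
    """
    return (P * C).sum(axis=1)

# Hypothetical example: entering state 1 costs 1.0, entering state 0 costs 0.5.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
C = np.array([[0.5, 1.0],
              [0.5, 1.0]])
cbar = average_cost(P, C)   # [0.55, 0.9]
```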


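The function

$$c(S(t)) = \bar c(S(t)) + E\left[\sum_{\tau=1}^{\infty} \gamma^{\tau}\, \bar c(S(t+\tau))\right], \qquad (22)$$

where $\bar c(\cdot)$ is the average cost in (21), $E[\cdot]$ denotes expectation, and $\gamma \in (0,1)$ is the discount factor, is the expected discounted
long-term cost. This function is also known as the cost-to-go function and is central to dynamic programming (DP) and
optimal control. For any fixed $S(t) = s \in \mathcal{S}$, the function $c(\cdot)$ is independent of the time index $t$
and can be rewritten as

$$c(s) = \bar c(s) + \sum_{s' \in \mathcal{S}} \sum_{\tau=1}^{\infty} \gamma^{\tau}\, p^{\tau}(s, s')\, \bar c(s'), \qquad (23)$$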
where

$$p^{\tau}(s, s') = \mathbb{P}\big(S(t+\tau) = s' \mid S(t) = s\big) \qquad (24)$$

is the $\tau$-step transition probability from state $s$ to $s'$. We propose an algorithm for the online learning of
cost-to-go functions in wireless networks from the observation of a state-cost trajectory of the
associated FSM.

The algorithm is composed of three elements:

• observation: the transition probabilities and the average cost function $\bar c(\cdot)$ are estimated by observing
a state-cost sample path;

• projection: $c$ is projected onto a graph wavelet basis set capturing typical structures in the
graph;

• sparse estimation of $c$: a sparse estimation algorithm is used to identify a concise set of
basis functions providing the best fit with the estimated transition probabilities and cost
function.

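We define the $N \times N$ matrix $P$ to be the probability transition matrix, with $P[s, s'] = p(s, s')$ as
in Eq. (20). Collecting the average costs (21) in the vector $\bar c$, the long-term cost $c$ can be rewritten as

$$c = \bar c + \sum_{\tau=1}^{\infty} \gamma^{\tau} P^{\tau} \bar c = \bar c + \gamma P c. \qquad (25)$$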
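Eq. (25) is a linear system, $(I - \gamma P) c = \bar c$, which has a unique solution because $\gamma < 1$ and $P$ is stochastic. The sketch below, for an assumed two-state chain, computes $c$ three ways: by solving that system, by repeatedly applying $\bar c + \gamma P c$ (a $\gamma$-contraction in the max norm), and by truncating the power series in (23).

```python
import numpy as np

def cost_to_go(P, cbar, gamma):
    """Solve (I - gamma P) c = cbar, the closed form of Eq. (25)."""
    return np.linalg.solve(np.eye(P.shape[0]) - gamma * P, cbar)

def cost_to_go_fixed_point(P, cbar, gamma, iters=500):
    """Iterate Omega(c) = cbar + gamma P c from c = 0; Omega is a
    gamma-contraction in the max norm, so the iterates converge to c."""
    c = np.zeros_like(cbar)
    for _ in range(iters):
        c = cbar + gamma * (P @ c)
    return c

# Hypothetical two-state chain and average costs.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
cbar = np.array([0.55, 0.9])
gamma = 0.9
c_direct = cost_to_go(P, cbar, gamma)
c_iter = cost_to_go_fixed_point(P, cbar, gamma)
# Power series of (23): sum over tau >= 0 of gamma^tau P^tau cbar, truncated.
c_series = sum(np.linalg.matrix_power(gamma * P, k) @ cbar for k in range(500))
```

The direct solve costs $O(N^3)$ and needs $P$ explicitly, which is what becomes impractical for large FSMs and motivates estimating a compact representation instead.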


Thus, $c$ can be computed as the fixed-point solution $c = \Omega(c)$ of the operator $\Omega(c) = \bar c + \gamma P c$.
The transition matrix $P$ and the average-cost vector $\bar c$ are not known a priori and need to be estimated from
observation. At time $T$, the sample path $\mathcal{O}_T$ is used to compute the estimates $\hat P(T)$ and $\hat{\bar c}(T)$ of
$P$ and $\bar c$. We use the estimators


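$$\hat P_{ij}(T) = \begin{cases} \dfrac{\sum_{t=0}^{T-1} \mathbb{1}\big(S(t)=i,\, S(t+1)=j\big)}{\sum_{t=0}^{T-1} \mathbb{1}\big(S(t)=i\big)} & \text{if } \exists\, S(t) = i,\ t = 0, \ldots, T-1,\\[2ex] 0 & \text{otherwise,} \end{cases} \qquad (26)$$

$$\hat{\bar c}_i(T) = \begin{cases} \dfrac{\sum_{t=0}^{T-1} \mathbb{1}\big(S(t)=i\big)\, c\big(S(t), S(t+1)\big)}{\sum_{t=0}^{T-1} \mathbb{1}\big(S(t)=i\big)} & \text{if } \exists\, S(t) = i,\ t = 0, \ldots, T-1,\\[2ex] 0 & \text{otherwise,} \end{cases} \qquad (27)$$

where $\mathbb{1}(\cdot)$ is the indicator function.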
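Estimators (26) and (27) are empirical transition frequencies and per-state average costs accumulated along the observed path. A minimal sketch, in which the path, the number of states, and the transition-cost function are hypothetical inputs:

```python
import numpy as np

def estimate_model(path, transition_cost, n_states):
    """Empirical estimators of Eqs. (26)-(27) from a sample path S(0), ..., S(T).

    path            -- sequence of visited states
    transition_cost -- function c(s, s') giving the observed cost of a transition
    Rows/entries for states never visited in t = 0, ..., T-1 are left at zero.
    """
    counts = np.zeros((n_states, n_states))
    cost_sum = np.zeros(n_states)
    for s, s_next in zip(path[:-1], path[1:]):
        counts[s, s_next] += 1
        cost_sum[s] += transition_cost(s, s_next)
    visits = counts.sum(axis=1)   # sum_t 1(S(t) = i)
    seen = visits > 0
    P_hat = np.zeros((n_states, n_states))
    cbar_hat = np.zeros(n_states)
    P_hat[seen] = counts[seen] / visits[seen][:, None]
    cbar_hat[seen] = cost_sum[seen] / visits[seen]
    return P_hat, cbar_hat

path = [0, 1, 1, 0, 1]   # hypothetical observed states
P_hat, cbar_hat = estimate_model(path, lambda s, s_next: 1.0 + s_next, n_states=3)
# P_hat -> [[0, 1, 0], [0.5, 0.5, 0], [0, 0, 0]], cbar_hat -> [2.0, 1.5, 0.0]
```

States that are never visited (state 2 here) keep all-zero rows, which is exactly the case the "otherwise" branches of (26) and (27) cover.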

A fundamental element of the proposed framework is the projection of the cost-to-go function $c$
onto a set of basis functions capturing the typical substructures of the graph at various time scales.
We employ the recently proposed Diffusion Wavelets (DWs) as a basis set for the projection.
DWs are a multiresolution geometric construction for the multiscale analysis of operators on
graphs. DW functions are computed by sequentially applying a diffusion operator (for instance,
the transition matrix $P$) at the current scale $k$, compressing the range via a local orthonormalization
procedure, representing the operator in the compressed range, and computing $P^{2^k}$ on this
range. Functions defined on the support space are analyzed in multiresolution fashion, where
dyadic powers of the diffusion operator correspond to dilations and projections correspond to
downsampling. Even though $P$ is not known a priori, we assume that the locations of the non-zero
elements of $P$, that is, the connectivity structure of $P$, are known. Define $I(P) = \mathrm{sgn}(P + P^T)$.
The basis set $W$ is then computed on $P_{\mathrm{symm}}$, where the $l$-th row of $P_{\mathrm{symm}}$ is

$$[P_{\mathrm{symm}}]_l = [I(P)]_l \Big/ \sum_j [I(P)]_{lj}. \qquad (28)$$
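Eq. (28) uses only the sparsity pattern of $P$: it symmetrizes the connectivity and normalizes each row to sum to one. A short sketch, assuming $P$ has non-negative entries and every state has at least one transition (so no row of $I(P)$ is all zero); the example chain is hypothetical, and the diffusion wavelet construction itself is not shown.

```python
import numpy as np

def symmetrized_connectivity(P):
    """Eq. (28): I(P) = sgn(P + P^T), then normalize each row to sum to one.

    Only the locations of the non-zero entries of P matter, so a rough
    estimate of P with the correct connectivity gives the same result.
    """
    conn = np.sign(P + P.T)
    return conn / conn.sum(axis=1, keepdims=True)

# Hypothetical 3-state chain with transitions 0 -> 1 and 1 -> 2 (plus self-loops).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
P_symm = symmetrized_connectivity(P)
# -> [[1/2, 1/2, 0], [1/3, 1/3, 1/3], [0, 1/2, 1/2]]
```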

Define $W$ as a diffusion wavelet basis set computed on $P_{\mathrm{symm}}$, where the DW functions are
the columns of $W$. We then have $c \approx Wx$, where $x$ is the representation vector collecting the
coefficients of the wavelet functions in $W$. Given $P$ and $\bar c$, the representation vector $x^*$ providing
the most accurate approximation of $c$ on $W$ minimizes the Bellman residual $\|\Omega(Wx) - Wx\|_2^2$.
Since $\Omega(Wx) - Wx = \bar c - (I - \gamma P) W x$, we have

$$x^* = \arg\min_x \big\|\bar c - (I - \gamma P) W x\big\|_2^2. \qquad (29)$$

The main idea behind the proposed estimation paradigm is that the DW set of functions is a sparsifying
basis for the cost-to-go function $c$. Due to the structured behavior defined by networking
protocols, a small number of functions can represent the evolution, and thus the collected cost,
of large groups of states. The Least Absolute Shrinkage and Selection Operator (LASSO) algorithm
minimizes the squared norm of the residual plus an $\ell_1$ regularization term. For the considered
problem, the LASSO is formulated as

$$x^*(T) = \arg\min_x \big\|R(T)\hat{\bar c}(T) - R(T)B(T)Wx\big\|_2^2 + \lambda \|x\|_1, \qquad (30)$$

where $B(T) = I - \gamma \hat P(T)$, $R$ is a random matrix, $R(T)$ is the submatrix formed by retaining
the columns of $R$ indexed by the states hit in the observation interval $T$, and $\lambda > 0$ is a
regularization parameter. We begin with the definition of the property we wish to show.

Definition 1 (Restricted Isometry Property): The observation matrix $B$ is said to satisfy the
restricted isometry property of order $S \in \mathbb{N}$ with parameter $\delta_S \in (0,1)$, i.e., RIP$(S, \delta_S)$, if

$$(1 - \delta_S)\|x\|_2^2 \le \|Bx\|_2^2 \le (1 + \delta_S)\|x\|_2^2 \qquad (31)$$

holds for all $x \in \mathbb{R}^N$ having no more than $S$ non-zero entries. Note that $B$ is a $K \times N$ matrix.
RIP implies that $B$ is approximately an isometry for $S$-sparse signals.

We have the following theorem.

Theorem 1: The probability that the matrix $R(T)(I - \gamma P)$ does not satisfy RIP$(S, \delta_S)$ is bounded as

$$\mathbb{P}\big(R(T)B \text{ does not satisfy RIP}(S, \delta_S)\big) \le \exp(\cdots) \quad \text{if } K^2 > \cdots$$

This result states that if the number of observations $K$ is of order $O(S^2 \sqrt{n} \log n)$, then RIP is
satisfied with high probability as the network grows large. We contrast this with the more typical
results seen in, say, channel estimation problems, where the order is $O(S^2 \log n)$. We remark
that our result on the RIP is not limited to LASSO, but leads to the more general conclusion
that sparse estimation algorithms can be used to approximate cost-to-go functions of wireless
networks.

Figure 1: Comparison of the reconstruction error, as a function of the time slot, achieved by the
proposed algorithm and by Q-learning. All states hit by the sample path are used in the estimation.
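No particular solver for (30) is named here. As one standard option, the sketch below solves the LASSO with iterative soft-thresholding (ISTA). In the notation of (30), `y` plays the role of $R(T)\hat{\bar c}(T)$ and `M` the role of $R(T)B(T)W$; the usage example replaces them with a hypothetical Gaussian matrix and a synthetic 4-sparse coefficient vector, so it only illustrates recovering a sparse $x$ from $K < N$ measurements.

```python
import numpy as np

def ista_lasso(M, y, lam, iters=3000):
    """Iterative soft-thresholding for  min_x ||y - M x||_2^2 + lam * ||x||_1."""
    # Step size 1/L, where L = 2 * ||M||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * np.linalg.norm(M, 2) ** 2)
    x = np.zeros(M.shape[1])
    for _ in range(iters):
        z = x - step * 2.0 * M.T @ (M @ x - y)                    # gradient step on the quadratic
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold (prox of l1)
    return x

# Usage: K = 40 random measurements of an N = 100 vector with 4 non-zeros.
rng = np.random.default_rng(1)
K, N = 40, 100
M = rng.normal(scale=1.0 / np.sqrt(K), size=(K, N))
x_true = np.zeros(N)
x_true[[3, 27, 58, 90]] = [1.5, -2.0, 1.0, 3.0]
y = M @ x_true
x_hat = ista_lasso(M, y, lam=0.005)
```

ISTA is simple but can be slow on ill-conditioned problems; accelerated variants (FISTA) or coordinate descent are common drop-in replacements for the same objective.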

We present numerical results for an example wireless network to demonstrate the potential
of the compressed sensing approach. We consider a wireless network where terminals store
packets in a finite buffer of size $Q$ and employ Automatic Repeat reQuest (ARQ) to
improve the delivery rate of packets. Time is divided into slots of fixed duration.

The FSM tracking the state of each individual terminal is composed of two sub-chains: a random-walk-like
sub-chain tracking the number of packets in the buffer (state space $\{0, 1, \ldots, Q\}$) and a
forward-counter-like sub-chain tracking the retransmission index of the packet being transmitted
(state space $\{0, 1, \ldots, F\}$, where $F$ is the maximum number of transmissions of a packet). The
FSM tracking the state of the overall network is the composition of the FSMs of the individual
terminals. The cost function $c$ measures the normalized cost in terms of throughput loss with
respect to the saturation throughput achieved by the terminals in the absence of interference.

For $Q = 5$, $F = 4$, and two terminals, the size of the state space is 1681. To keep complexity
low, the columns of $W$ are subsampled; in particular, we select 400 wavelet functions at different
time scales. Fig. 1 depicts the reconstruction error achieved by the proposed compressed-sensing-based
framework and that of standard Q-learning as a function of the length of the observed
sample path. The proposed algorithm achieves considerable accuracy in the estimation of $c$
after a very small number of state-cost observations, whereas standard learning converges slowly
to $c$. Moreover, the solution is extremely stable, and the compressed-sensing-based algorithm
appears to be very robust to estimation noise. In the future, we will study the design of
reduced-dimension network protocols that recover quickly from drastic topology changes.